Clustering high-dimensional data: English Wikipedia

Clustering high-dimensional data

Clustering high-dimensional data is the cluster analysis of data with anywhere from a few dozen to many thousands of dimensions. Such high-dimensional data spaces are often encountered in areas such as medicine, where DNA microarray technology can produce a large number of measurements at once, and the clustering of text documents, where, if a word-frequency vector is used, the number of dimensions equals the size of the vocabulary.
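As a minimal sketch of why text clustering is high-dimensional, the Python below builds word-frequency vectors for a toy corpus. The helper name `word_frequency_vectors`, the sample documents and the naive lowercase-and-split tokenization are illustrative assumptions, not part of any particular library; the point is only that every vector gets one coordinate per vocabulary word.

```python
from collections import Counter

def word_frequency_vectors(documents):
    """Map each document to a word-count vector over the shared vocabulary.

    Hypothetical helper: tokenization is deliberately naive (lowercase +
    whitespace split) and is an illustrative assumption only.
    """
    tokenized = [doc.lower().split() for doc in documents]
    # The vocabulary is every distinct word seen in any document.
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # One coordinate per vocabulary word, so len(vector) == len(vocabulary).
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors

docs = ["the cat sat", "the dog sat down", "a cat and a dog"]
vocab, vecs = word_frequency_vectors(docs)
print(len(vocab), [len(v) for v in vecs])  # prints: 7 [7, 7, 7]
```

With a realistic corpus the vocabulary, and therefore the dimensionality, easily reaches tens of thousands, and most coordinates of each vector are zero.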
==Problems==
Four problems need to be overcome for clustering in high-dimensional data:
* Multiple dimensions are hard to think in, impossible to visualize, and, due to the exponential growth of the number of possible values with each dimension, complete enumeration of all subspaces becomes intractable with increasing dimensionality. This problem is known as the curse of dimensionality.
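* The concept of distance becomes less precise as the number of dimensions grows, since the distances between points in a given dataset become increasingly similar. In particular, the discrimination between the nearest and the farthest point becomes meaningless:
::<math>\lim_{d \to \infty} \frac{\text{dist}_{\max} - \text{dist}_{\min}}{\text{dist}_{\min}} = 0</math>
::where <math>d</math> is the number of dimensions and <math>\text{dist}_{\max}</math> and <math>\text{dist}_{\min}</math> are the distances from a query point to the farthest and the nearest point of the dataset.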

* A cluster is intended to group objects that are related, based on observations of their attributes' values. However, given a large number of attributes, some of the attributes will usually not be meaningful for a given cluster. For example, in newborn screening a cluster of samples might identify newborns that share similar blood values, which might lead to insights about the relevance of certain blood values for a disease. But for different diseases, different blood values might form a cluster, and other values might be uncorrelated. This is known as the ''local feature relevance'' problem: different clusters might be found in different subspaces, so a global filtering of attributes is not sufficient.
* Given a large number of attributes, it is likely that some attributes are correlated. Hence, clusters might exist in arbitrarily oriented affine subspaces.
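The enumeration argument in the first item can be made concrete by counting only axis-parallel subspaces, i.e. non-empty subsets of the <math>d</math> original attributes (a deliberate simplification that ignores the arbitrarily oriented subspaces of the last item):

```latex
\sum_{k=1}^{d} \binom{d}{k} = 2^{d} - 1,
\qquad d = 100 \;\Rightarrow\; 2^{100} - 1 \approx 1.27 \times 10^{30}
```

Even this restricted count doubles with every added attribute, so exhaustive search over subspaces quickly becomes infeasible as <math>d</math> grows.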
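The limit in the second item, the ratio (dist_max - dist_min) / dist_min often called relative contrast, can be observed empirically. The sketch below assumes query and data points drawn uniformly from the unit hypercube and Euclidean distance; the function name `relative_contrast`, the sample sizes and the seed are illustrative choices, not taken from the text. It needs Python 3.8+ for `math.dist`.

```python
import math
import random

def relative_contrast(num_points, dims, rng):
    """(dist_max - dist_min) / dist_min for one random query against random points.

    Hypothetical helper: query and points are drawn uniformly from [0, 1]^dims,
    an illustrative assumption rather than a model of real data.
    """
    query = [rng.random() for _ in range(dims)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dims)])  # Euclidean distance
        for _ in range(num_points)
    ]
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min

rng = random.Random(0)  # fixed seed so runs are repeatable
results = {}
for dims in (2, 10, 100, 1000):
    # Average over a few query points to smooth out sampling noise.
    contrasts = [relative_contrast(200, dims, rng) for _ in range(5)]
    results[dims] = sum(contrasts) / len(contrasts)
    print(f"d={dims:5d}  mean relative contrast = {results[dims]:.3f}")
```

The mean contrast shrinks steadily as <math>d</math> grows, in line with the limit; the exact values depend on the seed, and data with strong cluster structure may behave differently from this uniform noise.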
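The newborn-screening example in the third item can be mimicked with synthetic data. In the hypothetical sketch below, cluster A is tight in attributes 0 and 1 and spread out in attributes 2 and 3, while cluster B is the reverse; an attribute counts as "relevant" to a cluster when its within-cluster variance falls below an assumed threshold of 0.01. The helpers `make_cluster` and `relevant_dims` are invented for illustration.

```python
import random
import statistics

def make_cluster(n, tight_dims, center, dims, rng):
    """Hypothetical data: points grouped around `center` on tight_dims, uniform elsewhere."""
    points = []
    for _ in range(n):
        point = [
            rng.gauss(center, 0.02) if j in tight_dims else rng.random()
            for j in range(dims)
        ]
        points.append(point)
    return points

def relevant_dims(points, threshold=0.01):
    """Attributes whose population variance is below an (assumed) threshold."""
    dims = len(points[0])
    return [j for j in range(dims)
            if statistics.pvariance([p[j] for p in points]) < threshold]

rng = random.Random(1)
cluster_a = make_cluster(100, {0, 1}, 0.2, 4, rng)
cluster_b = make_cluster(100, {2, 3}, 0.8, 4, rng)
print("cluster A relevant dims:", relevant_dims(cluster_a))
print("cluster B relevant dims:", relevant_dims(cluster_b))
print("pooled data relevant dims:", relevant_dims(cluster_a + cluster_b))
```

Each cluster has its own relevant attributes, but on the pooled data no attribute falls below the threshold, which is why a single global filter of attributes cannot recover either cluster's subspace.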
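The last item, clusters in arbitrarily oriented subspaces, can be illustrated with two correlated attributes plus a little noise. In this hypothetical sketch every point lies near the line x1 = x0, and `variance_along` is an invented helper that projects the points onto a direction and measures their spread. The direction (1, -1) is chosen by hand here, whereas a real method would have to estimate it, for example with a principal component analysis of the cluster.

```python
import math
import random
import statistics

def variance_along(points, direction):
    """Population variance of the points projected onto `direction` (normalized here)."""
    norm = math.hypot(*direction)
    unit = [c / norm for c in direction]
    projections = [sum(p_i * u_i for p_i, u_i in zip(p, unit)) for p in points]
    return statistics.pvariance(projections)

rng = random.Random(2)
points = []
for _ in range(200):
    t = rng.random()                              # position along the line x1 = x0
    points.append((t, t + rng.gauss(0.0, 0.01)))  # small noise off the line

axis_vars = [variance_along(points, d) for d in ((1, 0), (0, 1))]
diag_var = variance_along(points, (1, -1))
print("variance along each axis:", [round(v, 4) for v in axis_vars])
print("variance across the diagonal:", round(diag_var, 6))
```

No single attribute looks irrelevant on its own, since both axes show large variance, yet the cluster effectively occupies a one-dimensional subspace along the diagonal.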
Recent research indicates that the discrimination problems only occur when there is a high number of irrelevant dimensions, and that shared-nearest-neighbor approaches can improve results.
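The shared-nearest-neighbor idea can be sketched in a few lines. One common definition, assumed here, scores two points by how many entries their k-nearest-neighbor lists have in common, so the measure depends on neighbor rankings rather than on raw distance values. The helpers `knn_sets` and `snn_similarity`, the brute-force neighbor search and the toy data are illustrative only; the data is kept two-dimensional so the result is easy to check by hand.

```python
import math

def knn_sets(points, k):
    """For each point, the set of indices of its k nearest neighbors (excluding itself)."""
    neighbors = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        neighbors.append(set(others[:k]))
    return neighbors

def snn_similarity(neighbors, i, j):
    """Number of neighbors shared by the k-NN lists of points i and j."""
    return len(neighbors[i] & neighbors[j])

# Two tight groups of four points each (hypothetical toy data).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
nbrs = knn_sets(points, k=3)
print(snn_similarity(nbrs, 0, 1))  # prints: 2 (same group)
print(snn_similarity(nbrs, 0, 4))  # prints: 0 (different groups)
```

The brute-force search costs O(n^2 log n) and is meant only to show the definition, not as an efficient implementation.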

Excerpted from: Wikipedia, the free encyclopedia